Version: 0.17.x

Generate from image (Streamed)

POST /image_generate_stream

The /image_generate_stream endpoint sends an image to a multimodal LLM and returns the response as a stream, token by token. To receive the whole response at once, use the /image_generate endpoint instead.

This endpoint takes a multipart input, with two required fields:

  1. 'json_data': a JSON payload in the same format used by the /generate and /generate_stream endpoints.
  2. 'image_data': a stream of bytes representing an image file.

Multipart request support is built into most common HTTP clients.

To send a batch of requests with the same image, set the text field of the JSON payload to an array of strings instead of a single string. Only one image can be supplied per request. To run generation requests against different images, send them in quick succession and rely on automatic batching.
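As an illustration of the multipart layout described above, here is a minimal sketch that encodes the two fields by hand using only the Python standard library. The helper names, the per-part Content-Type headers, the filename, and the host in the usage note are assumptions made for the example, not part of the documented API. Most HTTP clients will build an equivalent body for you.

```python
import json
import urllib.request
import uuid


def build_multipart(payload: dict, image_bytes: bytes, image_name: str = "image.png"):
    """Encode json_data and image_data as one multipart/form-data body.

    Returns (body, headers). The per-part Content-Type values are assumptions.
    """
    boundary = uuid.uuid4().hex
    json_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="json_data"\r\n'
        "Content-Type: application/json\r\n\r\n"
    ).encode() + json.dumps(payload).encode() + b"\r\n"
    image_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image_data"; filename="{image_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + image_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return json_part + image_part + closing, headers


def build_request(base_url: str, payload: dict, image_bytes: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /image_generate_stream."""
    body, headers = build_multipart(payload, image_bytes)
    return urllib.request.Request(
        base_url + "/image_generate_stream", data=body, headers=headers, method="POST"
    )
```

A request for a batch of two prompts against one image could then be prepared with build_request("http://localhost:3000", {"text": ["1 2 3 4", "a b c d"]}, image_bytes), where the host and port are placeholders for your own deployment, and sent with urllib.request.urlopen.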

The response is a stream of server-sent events, where each event is a token generated by the LLM. For example, if you supply a batch of inputs:

{
"text": ["1 2 3 4", "a b c d"]
}

The data field of each server-sent event is a JSON payload with a text field containing the token and a batch_id field containing the index of the input in the batch that the token belongs to.

data:{"text": "5", "batch_id": 0}

data:{"text": "e", "batch_id": 1}

data:{"text": "6", "batch_id": 0}

data:{"text": "f", "batch_id": 1}

The order in which tokens for different batch_ids are interleaved is not guaranteed.
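Because tokens from different inputs can arrive interleaved, a client usually has to regroup them by batch_id. The sketch below shows one way to do that over already-received event lines. The function name is hypothetical, and defaulting a missing batch_id to 0 (for non-batched requests) is an assumption rather than documented behaviour.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_tokens(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group streamed tokens by batch_id, keeping arrival order within each input."""
    tokens = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        event = json.loads(line[len("data:"):])
        # Assumption: treat a missing batch_id as input 0.
        tokens[event.get("batch_id", 0)].append(event["text"])
    return dict(tokens)


stream = [
    'data:{"text": "5", "batch_id": 0}',
    "",
    'data:{"text": "e", "batch_id": 1}',
    "",
    'data:{"text": "6", "batch_id": 0}',
    "",
    'data:{"text": "f", "batch_id": 1}',
]
grouped = collect_tokens(stream)
```

Applied to the example events above, grouped maps input 0 to the tokens "5" and "6", and input 1 to "e" and "f", regardless of how the two inputs were interleaved.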

Request

Body (required)

    image_data (binary, required)

    json_data (object, required)

    JSON generation payload, used in /generate, /generate_stream, /image_generate, /image_generate_stream

    consumer_group (string, nullable)
    json_schema (nullable)
    max_new_tokens (int64, nullable)
    min_new_tokens (int64, nullable)
    no_repeat_ngram_size (int64, nullable)
    prompt_max_tokens (int64, nullable)
    regex_string (string, nullable)
    repetition_penalty (float, nullable)
    sampling_temperature (float, nullable)
    sampling_topk (int64, nullable)
    sampling_topp (float, nullable)

    text (object, required)

    oneOf: string
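To make the shape of json_data concrete, here is a small sketch that assembles the payload from the required text field plus any of the optional fields listed in the schema. The helper name is hypothetical. Dropping options set to None instead of sending explicit nulls is a design choice for the example, not a requirement of the endpoint.

```python
# Optional fields from the request schema; all of them are nullable.
OPTIONAL_FIELDS = {
    "consumer_group", "json_schema", "max_new_tokens", "min_new_tokens",
    "no_repeat_ngram_size", "prompt_max_tokens", "regex_string",
    "repetition_penalty", "sampling_temperature", "sampling_topk",
    "sampling_topp",
}


def make_payload(text, **options):
    """Build a json_data dict; text may be a string or a list of strings."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        # Catch misspelled field names before they reach the server.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"text": text}
    # Omit options left as None so only explicitly set values are sent.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For instance, make_payload(["1 2 3 4", "a b c d"], max_new_tokens=2) produces a batched payload with a token limit, while a misspelled option name raises a ValueError on the client side.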

Responses

Takes in a JSON payload and returns the response token by token, as a stream of server-sent events.

Schema

    text (object, required)

    oneOf: string
